Balanced Overlay Networks (BON): An Overlay Technology for Decentralized Load Balancing

Jesse S.A. Bridgewater, Vwani P. Roychowdhury and P. Oscar Boykin, Member, IEEE 



Abstract: We present a novel framework, called balanced overlay networks (BON), that provides scalable, decentralized load balancing for distributed computing using large-scale pools of heterogeneous computers. Fundamentally, BON encodes the information about each node's available computational resources in the structure of the links connecting the nodes in the network. This distributed encoding is self-organized, with each node managing its in-degree and local connectivity via random-walk sampling. Assignment of incoming jobs to nodes with the most free resources is also accomplished by sampling the nodes via short random walks. Extensive simulations show that the resulting highly dynamic and self-organized graph structure can efficiently balance computational load throughout large-scale networks. These simulations cover a wide spectrum of cases, including significant heterogeneity in available computing resources and high burstiness in incoming load. We provide analytical results that prove BON's scalability for truly large-scale networks: in particular, we show that under certain ideal conditions the network structure converges to Erdos-Renyi (ER) random graphs; our simulation results, however, show that the algorithm does much better, and the structures seem to approach the ideal case of d-regular random graphs. We also make a connection between highly loaded BONs and the well-known ball-bin randomized load balancing framework.



I. Introduction 

Distributed computing was one of the earliest applications of computer networking, and many different methods have been developed to harness the collective resources of networked computers. Some important architectures include centralized client-server systems, DHT-based systems, and diffusive algorithms. Here we introduce the concept of balanced overlay networks (BON), which takes the novel approach of encoding the resource balancing algorithm into the evolution of the network's topology. Each node's in-degree is kept proportional to its unused resources by adding edges when resources are freed and removing edges when resources are consumed, as depicted in Figs. 1 and 2. As we will show, this topology makes it possible to efficiently locate nodes with the most free resources, which in turn enables load balancing with no central server.

This work makes several novel contributions to distributed computing and resource sharing. First, BON is decentralized and scalable, with known lower bounds on balancing performance. While other decentralized load-balancing algorithms (e.g., Messor; see Section II for more detailed comparisons) have been proposed in the literature, performance and scalability analyses for such algorithms, which guarantee almost-optimal performance as the number of nodes becomes very large, have been lacking. Under certain ideal conditions, we show that the network structure converges to a random graph that is at least as regular and balanced as Erdos-Renyi (ER) graphs. Second, the algorithms and protocols for both network maintenance and job allocation are based only on local information and actions: each node decides the amount of resource or computing power it wants to share, and it embeds this information into the network structure via short random walks; similarly, jobs are distributed based only on information available through local explorations of the overlay network. Thus, BON is a truly self-organized dynamical system. Third, since the BON algorithm produces dynamic random graph topologies, the resulting networks are very resilient to random damage and have no central point of failure. Finally, we make a connection between the performance of BON in some regimes and ball-bin random load balancing problems [1].

It is also important to note that BON is a novel paradigm for resource sharing of any kind, and its applicability is not limited to distributed computing. The in-degree of a node can be made to correspond to any type of shareable resource. One can then exploit the fact that BON networks have the low diameters associated with random graphs, which makes them easy to sample using short random walks. Extensive simulation results support the efficacy of this approach in networks with a wide range of resource and load distributions. These simulations show that the actual performance of the algorithm far exceeds the lower bounds mentioned above.

Fig. 1. New jobs are assigned by using a greedy random walk. The large nodes depict computers in a schematic BON network, while the small filled nodes are jobs running on the node to which they are connected. The label for each of the computer nodes denotes the current number of jobs it is running. Part a) shows new load entering the network. In b) we see that the node where load arrives initiates a random walk which keeps track of the degree (free resources) of each visited node. The largest-degree node (most free resources) is selected to run the new load. To compensate for the additional load, the node which accepted the new load deletes one of its incoming edges to account for its diminished resources. The resulting network is depicted in c).

BON is a very simple, realistic and easily implementable algorithm using standard networking protocols. The completely decentralized nature of the algorithm makes it very well suited to massive applications encompassing very large ensembles of nodes. The following are a few examples of applications for which BON is very well suited.

Single-System Image (SSI) LAN/WAN clusters: BON can be used for single-system image (SSI) clusters in the same way that Mosix [2] is used, but without the need for all nodes to be aware of each other as is the case in Mosix. This can allow BON to scale to very large system sizes.

Public Resource Computing: BON is also applicable to @HOME-style projects [3]. These projects are typically special-purpose for each application. The decentralized nature of BON will allow multiple projects to share the same pool of computers.

Grid Computing: BON also has the potential to be integrated with GRID [4], [5] systems for efficient resource discovery and load distribution across virtual organizations (VOs).

Web Mirroring: Distributed web mirroring is an example of a non-computational application of the BON algorithm. The system could allow a huge number of software users to participate in providing download mirrors.

This paper is organized as follows. Section II describes prior related load balancing research. Section III introduces the BON architecture. Section IV discusses theoretical analysis of BON's scalability. Section V provides a description of the simulation setup and results. Finally, Section VI deals with practical considerations for implementing BON.

II. Related Work 

The authors have previously considered topologically-based load balancing with a simpler model than BON, which is amenable to analytical study [6]. In that work each node's resources were proportional to in-degree, and load was distributed by performing a short random walk and migrating load to the last node of the walk; this method produces Erdos-Renyi (ER) random graphs and exhibits good load-balancing performance. As we demonstrate in the current work, performing more complex functions on the random walk can significantly improve performance.

The majority of distributed computing research has focused on central server methods, DHT architectures, agent-based systems, randomized algorithms and local diffusive techniques [1], [7]–[12]. Some of the most successful systems to date [3], [13] have used a centralized approach. This can be explained by the relatively small scale of the networked systems or by special properties of the workload experienced by these systems. However, since a central server must have O(N) bandwidth capacity and CPU power, systems that depend on central architectures are unscalable [14], [15]. Reliability is also a concern, since a central server is a single point of failure. BON addresses both of these issues by using O(log N) maximum communications scaling and by having no single point of failure. Furthermore, since the networks created by the BON algorithm are random graphs, they will be highly robust to random failures.

Fig. 2. When a running job finishes, the host node may need to increase its connectivity to advertise its increased resources. Subpart a) shows a job finishing and thus leaving the network. In b) the node where load finishes initiates a random walk. The last node on the walk will be the origin of a new edge incident on the walk initiator, as seen in c). This new edge represents the increase in available resources on the node where the job just completed.

The Messor project [9] in particular has the same goal as BON, which is to provide self-organized, distributed load balancing. The agent-based design of Messor also involves performing random walks on a network to distribute load. However, BON is designed specifically to reshape the network structure so that it can be efficiently sampled. Messor was inspired by the notion of a swarm of ants that wander around the network picking up and dropping off load, so it is not clear how long the ant agents will need to walk while performing the load balancing. It is the focus on topology that distinguishes BON from other similar efforts: BON endeavors to reshape the network topology to make resource discovery feasible with random walks of length O(log N). A simplified version of BON can be analyzed, and thus we can put performance bounds on its behavior. Messor, while very intriguing, provides no analytical treatment.

Within the large body of research, several techniques have been implemented, including Mosix, Messor, BOINC, Condor, SWORD, Astrolabe, INS/Twine and Xenosearch [2], [3], [8], [9], [13], [16]–[18], among others. Many of these systems focus on providing a specific desired level of service for jobs. This contrasts with the approach taken by BON, Mosix and others, in which processes are migrated to nodes where they will have the most resources applied to them rather than a specific level of resources. The other systems are mostly based on DHT architectures and provide for querying based on arbitrary node attributes and link qualities. For complex distributed applications where each participating node must have a certain level of resources and where the connectivity between the nodes must have prescribed latencies, these DHT systems will be the most suitable platform. For many types of parallel scientific computing, however, BON's objective of placing a job where it will finish as quickly as possible is appropriate and desirable.



BON is designed to be deployed on extremely large ensembles of nodes. This is a major similarity with BOINC [3], the latest infrastructure for creating public-resource computing projects. The Einstein@home project, which processes gravitational data, and Predictor@home, which studies protein-related disease, are based on BOINC. Such projects are single-purpose and are designed to handle massive, embarrassingly parallel problems with tens or hundreds of thousands of nodes. BON should scale to networks of this size and beyond while providing a dynamic, multi-user environment instead of the special-purpose environment provided by BOINC.
Fig. 3. The relationship between load and node degree is the basis for the BON algorithm; a node with high in-degree is more likely to be visited on a random walk and thus more likely to be the recipient of new load than a node with low in-degree. As the total load increases, $\langle k\rangle$ decreases until the load becomes clipped. In the load-clipped regime the algorithm remains the same, but the mechanism behind the performance changes, and the problem becomes a ball-bin load balancing problem with log N choices. This change is due to the fact that there is no longer a connection between free resources and in-degree.



III. The BON Architecture 
A. BON Topology 

The concept underlying BON is that the load 
characteristics of a distributed computing system 
can be encoded in the topology of the graph that 
connects the computational nodes. 

In schematic terms, an edge in a BON graph represents some unit of unused capacity on the node to which the edge points. Consequently, when a node's resources are being exhausted, its in-degree will decline, as seen in Fig. 1. Conversely, when a node's available resources are increasing, its in-degree will rise, as seen in Fig. 2.

Formally, a BON is a dynamic, directed graph, $D = (E, V)$, where each node $v_i \in V$ maintains $k_i^{(\min)} \le k_i \le k_i^{(\max)}$ incoming edges. The maximum number of incoming edges that a node can have, $k_i^{(\max)}$, is proportional to the computational power of $v_i$. Each node $v_i$ has a scalar metric $S_i(t)$ which is kept inversely proportional to $k_i(t)$. As $S_i(t)$ changes with time, $v_i$ severs or acquires incoming links to maintain this relationship. In the context of distributed computing, $S_i(t)$ is a scalar representation of the current load experienced by node $v_i$. This means that each node will endeavor to keep its in-degree proportional to its free resources, or inversely proportional to its load. Idle nodes will have a relatively large in-degree, while overloaded nodes will have a small in-degree. The total unloaded resources of a node are proportional to its maximum in-degree, $k_i^{(\max)}$.
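To make the edge-per-unit-of-capacity encoding concrete, here is a minimal Python sketch of the per-node bookkeeping. It assumes, as Figs. 1 and 2 suggest but without claiming this is the authors' exact rule, that each accepted job costs one incoming edge and each finished job restores one, with the in-degree clipped to $[k_i^{(\min)}, k_i^{(\max)}]$. The class `BonNode` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BonNode:
    """Hypothetical per-node state for the in-degree encoding (illustrative only)."""

    k_min: int     # minimum in-degree kept to preserve connectivity
    k_max: int     # maximum in-degree, proportional to computational power
    jobs: int = 0  # number of jobs currently running on this node

    @property
    def power(self) -> int:
        # Assumption taken from the simulation setup: P_i = k_max - k_min.
        return self.k_max - self.k_min

    @property
    def target_in_degree(self) -> int:
        # Assumption: one incoming edge per free unit of capacity, clipped at k_min.
        return max(self.k_min, self.k_max - self.jobs)

    @property
    def clipped(self) -> bool:
        # Load clipping: the node sits at its minimum allowable in-degree.
        return self.target_in_degree == self.k_min

    def accept_job(self) -> int:
        """Accept a job; return how many incoming edges the node should sever."""
        before = self.target_in_degree
        self.jobs += 1
        return before - self.target_in_degree

    def finish_job(self) -> int:
        """Finish a job; return how many incoming edges the node should acquire."""
        if self.jobs == 0:
            raise ValueError("no running job to finish")
        before = self.target_in_degree
        self.jobs -= 1
        return self.target_in_degree - before
```

For example, a node with `k_min=4` and `k_max=8` severs one edge for each of its first four jobs and none for a fifth, at which point it is clipped; it still accepts the job, matching the remark below that a clipped node can still accept work. In an overlay, each severed or acquired edge would be located with the random walks of Figs. 1 and 2.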

B. BON Algorithm 

Each node's load, $S_i(t)$, can change as new load arrives in the network or when existing work is done. When new load arrives at $v_i$, a short random walk is initiated to locate a suitable execution site. Contained in this random walk is a BON resource discovery message (BRDM), which stores the merit function information for the most capable node visited so far on the walk. The fact that random walks preferentially sample nodes with high degree motivates the mapping of node in-degree to free resources. The simplest approach to choosing a node on the walk is to select the last node inserted into the BRDM; this case has been explored previously [6]. While this simple approach can be studied analytically, simulation results indicate that large improvements to the balancing performance are possible by always keeping the most capable node's information.

Instead of performing a simple random walk and selecting the last node to receive incoming load, the node on the walk with the largest power per load will be the target (see Algorithm 1). Due to the mapping between load and in-degree, this greedy random walk selects the least loaded node on the walk to receive new load, which is the same as choosing the highest-degree node when the network is not load clipped. This clipping occurs when a node has the minimum allowable in-degree. We will discuss the case when the network is load clipped in Section V-D. For time-sharing systems the concept of overloading is not well defined, since a node with L jobs will apply 1/L of its resources to each job. In the context of the BON algorithm, load clipping simply means that nodes have the minimum allowable in-degree and thus are no longer balancing load based on preferential sampling. In practice, a node in the clipped regime will be under very heavy computational load, but fundamentally it can still accept new jobs.

Algorithm 1 PickTarget(source): A new job entering the network initiates a random walk that maintains information about the node on the walk where the job would run the most quickly. The job is then assigned to that node when the walk is complete.
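A minimal Python sketch of PickTarget follows, under stated assumptions rather than as the authors' implementation. It assumes the overlay is stored as out-adjacency lists (walks follow edges toward the nodes those edges point to, i.e. toward free capacity), that the objective is the computing power a new job could expect on a node, taken here as $P_i/(L_i+1)$ under equal time-sharing, and that the walk length defaults to roughly $\ln N$ steps. The function names `pick_target` and `expected_power` are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def expected_power(power: float, jobs: int) -> float:
    # Assumption: under equal time-sharing a new job would get power / (jobs + 1).
    return power / (jobs + 1)


def pick_target(
    source: int,
    out_edges: Dict[int, List[int]],
    power: Dict[int, float],
    jobs: Dict[int, int],
    walk_length: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[int, List[int]]:
    """Greedy random walk: return the best node seen and the visited path."""
    rng = rng or random.Random()
    if walk_length is None:
        # Assumed default: a walk of roughly ln N steps.
        walk_length = max(1, math.ceil(math.log(len(out_edges))))
    # The BRDM carries only the best candidate seen so far and its objective.
    best = source
    best_obj = expected_power(power[source], jobs[source])
    current = source
    visited = [source]
    for _ in range(walk_length):
        neighbours = out_edges[current]
        if not neighbours:
            break  # dead end; a real protocol would need a fallback here
        current = rng.choice(neighbours)
        visited.append(current)
        obj = expected_power(power[current], jobs[current])
        if obj > best_obj:  # each hop overwrites the BRDM only if it is better
            best, best_obj = current, obj
    return best, visited
```

Replacing the "keep the best" rule with "return `visited[-1]`" gives the simpler last-node selection of [6]; the greedy variant simply keeps one extra comparison per hop in the BRDM.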
IV. Analysis

The performance of BON walk selection will be bounded below by the performance of the standard walk selection. Therefore, although we do not present a calculation of the load distribution for BON graphs, we can state that it has the same scalability as the standard walk case described below.

The exact BON algorithm is difficult to analyze; however, it is possible to place bounds on the balancing performance by simplifying the load distribution protocol. We also calculate the bandwidth used by the algorithm and compare it to a centralized model.

A. Scalability 

The BON algorithm is difficult to study analytically due to the way in which the random walks are sampled. However, prior results [6] show that a modified BON is more amenable to analysis.

Rather than selecting the node on the BRDM walk that can process an incoming job the fastest, one can simply select the last node of the walk. In this model the average number of absent edges, $J$, in the $N$-node graph is identified as the total number of jobs running. The maximum number of incoming edges that a node can have will be called $C$, and the number of incoming edges to a given node is denoted $i$. For the case when the average number of jobs remains constant, we can describe this system as a simple Markov process with state-dependent arrival and service rates; in standard queueing notation it is denoted $M/M/\infty//M$. The arrival rate of new jobs at a node is proportional to its free resources, $i/(NC - J)$, since jobs arrive preferentially based on in-degree. Assuming that jobs terminate uniformly at random, the departure rate is $(C - i)/J$. Solving the birth-and-death Markov process, we obtain the degree distribution:

$$P(i) = \binom{C}{i} \left(\frac{NC - J}{NC}\right)^{i} \left(\frac{J}{NC}\right)^{C - i} \qquad (1)$$
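The birth-and-death result can be checked numerically. The Python sketch below is our own illustration, not code from [6]: it builds the stationary distribution from detailed balance, $\pi(i)\,\frac{C-i}{J} = \pi(i+1)\,\frac{i+1}{NC-J}$, using the arrival (edge-removal) and departure (edge-addition) rates stated above, and compares it with a binomial distribution in which each of the $C$ capacity units is free with probability $(NC-J)/NC$. The parameter values are arbitrary.

```python
from math import comb
from typing import List


def stationary_from_rates(C: int, N: int, J: int) -> List[float]:
    """Stationary in-degree distribution of the birth-and-death chain.

    State i = free incoming edges (0..C). A job arrival removes an edge at
    rate i / (N*C - J); a job departure adds one at rate (C - i) / J.
    """
    weights = [1.0]
    for i in range(C):
        up = (C - i) / J              # i -> i + 1 (a job finishes)
        down = (i + 1) / (N * C - J)  # i + 1 -> i (a job arrives)
        weights.append(weights[-1] * up / down)  # detailed balance
    total = sum(weights)
    return [w / total for w in weights]


def binomial_pmf(C: int, p_free: float) -> List[float]:
    return [comb(C, i) * p_free**i * (1 - p_free) ** (C - i) for i in range(C + 1)]


if __name__ == "__main__":
    C, N, J = 20, 100, 600  # normalized load J / (N*C) = 0.3
    chain = stationary_from_rates(C, N, J)
    binom = binomial_pmf(C, 1 - J / (N * C))
    print(max(abs(a - b) for a, b in zip(chain, binom)))
```

The two distributions agree to floating-point precision, and the mean in-degree is $C(NC-J)/NC$, i.e. 14 free edges for these values.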
Defining the normalized load as $\alpha = J/NC$, the binomial distribution means that for each node, each unit of capacity is occupied with probability $\alpha$. If $C = N - 1$, this model recovers the degree distribution for ER graphs:

$$P(i) = \binom{N-1}{i} \left(\frac{E}{N(N-1)}\right)^{i} \left(1 - \frac{E}{N(N-1)}\right)^{N-1-i} \qquad (2)$$

where $E = N(N - 1) - J$.

For a non-clipped network with uniform resource distribution, the variance of the degree distribution maps directly onto the variance of the load balancing. This is because each incoming edge represents free resources. In a perfectly balanced network, each node will have the same free resources. This ideally balanced network would be a regular graph, and thus the variance of the degree distribution would be zero. For the simple case mentioned above, the degree distribution is binomial and thus has a small but non-vanishing variance, $C\alpha(1-\alpha)$.

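When the highest-degree (most free resources) node on a random walk is selected to receive incoming load, that node's free resources must be greater than or equal to those of the last node on the walk. The greedy choice therefore never places a job on a node with fewer free resources than the last-node rule would have chosen on the same walk, which is why the standard-walk analysis above bounds the performance of BON walk selection from below.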

In addition to this queueing model, it has been shown by information-theoretic arguments [6] that the simplified rewiring protocol described here creates ER random graphs.

B. Communications Complexity 

An important performance metric for distributed computing is the network bandwidth required by a protocol. The architecture that requires the least total bandwidth is a central server. However, the maximum bandwidth that any single node must consume in a central system is not the least: while the total consumption of bandwidth is important, the bandwidth that any single node consumes can be a significant bottleneck for large central networks. Below we compare the bandwidth required by a centralized algorithm and by BON.

1) Centralized: The simplest non-trivial centralized architecture for a computing network is the case where initially the central node, denoted C, knows the power and load of each of the N nodes that it controls. When a job on one of the nodes completes, that node notifies C so that it can update its load state information for the network. C also keeps track of assignments of new load to each of the nodes. This method does away with the need to periodically probe every node in the network; however, the bandwidth, memory and CPU cost that C has to bear is still O(N). Further assume a steady-state network load in which, in every time unit, $N\beta$ jobs begin and the same number terminate. We further assume that all the jobs start at one of the computational nodes and are then sent to C for assignment. Now assume that for every job that is started, a relatively large A-byte packet, including the program code and input data, must be sent from C to $N_i$, $i \in \{1, \dots, N\}$, and that relatively small L-byte packets must be sent to the central server in response to changes in load. Therefore C must send $N\beta A$ bytes per unit time, which consumes kernel resources and requires bandwidth that increases with N. The total bandwidth consumed by the entire centralized network is



$$B^{(C)} = N\beta\,[A + L] \qquad (3)$$



This is also the amount that C must consume, since it is involved in every communications round. For each node $N_i$, $i \in \{1, \dots, N\}$, the bandwidth consumed will be $B_i^{(C)} = \beta\,[A + L]$, which is O(1).

2) BON: For the decentralized BON algorithm, the network topology is more complex than for the central server. While the graph of the central model is a star, a BON will look approximately like a random regular graph. Initially we assume that we begin with a correctly formed BON. As with the central model, we assume that $N\beta$ jobs begin and end at random nodes in each time unit. Since there is no central server, each node that initiates a new job must send a BRDM to find a node to run the job. Every node on the walk will need to replace the objective value obj (L bytes of data) stored in the BRDM if its own objective value is larger than the one currently in the BRDM (see Algorithm 1). Since the random walk will be O(log N) steps long, the total bandwidth of the walk will be $B_w = L \log N$. Likewise, when a job finishes, another walk will be used to find a replacement for the removed edge. Factoring in the cost of transmitting the program to the target node and the needed handshaking protocols, the total bandwidth consumed by BON is

$$B^{(BON)} = N\beta\,[A + L(\log N + 2)] \qquad (4)$$

Therefore the total bandwidth cost of BON exceeds that of the central model by a signaling term that grows as O(log N) per job. However, a more important metric in many situations is the maximum bandwidth consumed by any of the nodes. In BON, each node will only consume bandwidth in proportion to how many jobs it initiates and how powerful it is. Thus, if all of the nodes use the network equally, each node will consume $B^{(BON)}/N$ bandwidth, which is logarithmic in the size of the network. This contrasts with the O(N) bandwidth needs of the central server.
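Equations (3) and (4) can be compared directly. The following is a minimal Python sketch that evaluates both models; the base of the logarithm is not specified in the text, so natural log is assumed, and the payload and message sizes in the usage loop are hypothetical values chosen only to show the trend.

```python
import math


def central_bandwidth(n: int, beta: float, a_bytes: float, l_bytes: float) -> dict:
    """Eq. (3): total bandwidth of the centralized model, all of it through C."""
    total = n * beta * (a_bytes + l_bytes)
    return {
        "total": total,
        "max_node": total,                       # C touches every message
        "per_worker": beta * (a_bytes + l_bytes),  # O(1) per computational node
    }


def bon_bandwidth(n: int, beta: float, a_bytes: float, l_bytes: float) -> dict:
    """Eq. (4): payload plus an O(log N) BRDM walk and handshakes per job."""
    total = n * beta * (a_bytes + l_bytes * (math.log(n) + 2))  # natural log assumed
    return {
        "total": total,
        "max_node": total / n,  # assumes all nodes use the network equally
    }


if __name__ == "__main__":
    # Hypothetical parameters: 1 MB job payloads, 100-byte control messages.
    for n in (10**3, 10**5, 10**7):
        c = central_bandwidth(n, beta=0.1, a_bytes=1e6, l_bytes=100.0)
        b = bon_bandwidth(n, beta=0.1, a_bytes=1e6, l_bytes=100.0)
        print(n, c["max_node"], b["max_node"], b["total"] / c["total"])
```

With a payload much larger than the control messages, the total-bandwidth ratio stays close to 1, while the busiest node's load grows linearly with N in the central model and only logarithmically in BON.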

V. Simulations 

A. Simulation Description 

For the simulations, each node in a BON network is a computer with power equal to its maximum degree minus its minimum degree, $P_i = k_i^{(\max)} - k_i^{(\min)}$. One unit of power can process one unit of load in each unit of time. Jobs run on these computers in a time-sharing fashion, with each of the L jobs on a computer equally sharing the node's power at each time step. The simulations deal only with CPU power as the objective function of the balancing. Other features such as memory ushering will not be simulated but will be added as features in the reference implementation. Simulations of the BON system were performed using the Netmodeler package. Two types of experiments were performed.
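The time-sharing rule above (each of the L jobs receives an equal share of the node's power per step) can be sketched in a few lines of Python. This is an illustration under our own simplifying assumption, not the Netmodeler code: capacity freed by a job that finishes partway through a step is not redistributed to the other jobs within that step. The function name is hypothetical.

```python
from typing import List


def time_share_step(power: float, remaining: List[float]) -> List[float]:
    """One time step on a node: each of the L running jobs gets power / L work.

    Returns the remaining sizes of jobs that are still unfinished. As a
    simplification (our assumption), capacity freed by a job that finishes
    mid-step is not redistributed to the others within that step.
    """
    if not remaining:
        return []
    share = power / len(remaining)
    return [r - share for r in remaining if r - share > 0]
```

For instance, a node with power 4 running jobs of remaining size 1, 3, 5 and 2 gives each job one unit of work, so the first job finishes and `[2.0, 4.0, 1.0]` remain.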

The first experiments are very idealized, using uniform node power, uniform job arrival rates and Poisson-distributed job sizes. Equation 5 indicates that all nodes have $k^{(\max)} = 71$, that the size of each job is Poisson distributed, and that at each time step $\beta$ jobs are created. For different simulations, $\beta$ and $\nu$ take different values in order to show a wide range of system behavior. While this setup is very idealized, it might apply to cluster computing.
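A workload of this idealized kind can be generated with the Python standard library alone. The sketch below assumes, since Equation 5 itself is not reproduced here, that $\nu$ is the mean of the Poisson job-size distribution and that exactly $\beta$ jobs arrive per step; Poisson variates are drawn with Knuth's multiplication method, which is adequate for modest means. Zero-size draws are kept as sampled; a simulator might choose to discard them. The function names are hypothetical.

```python
import math
import random
from typing import List


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; fine for the modest means of a sketch."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def uniform_workload(steps: int, beta: int, nu: float, seed: int = 0) -> List[List[int]]:
    """Per time step, beta new jobs with Poisson(nu) sizes (assumed reading of Eq. 5)."""
    rng = random.Random(seed)
    return [[poisson(rng, nu) for _ in range(beta)] for _ in range(steps)]
```

Seeding the generator makes a load trace reproducible, which is what allows the same trace to be replayed against BON and against a central scheduler, as in the throughput comparisons below.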
Fig. 4. BON graphs obey the random graph scaling relationship between diameter and average degree: Diameter $\propto \ln N / \ln\langle k\rangle$. This BON graph was generated using the uniform simulation parameters from Eqn. 5.
The second type of simulation (Eqn. 6) uses power-law distributions for all parameters, including job arrival rate, node power and job size. This configuration represents a situation where every important system parameter is distributed in a bursty, heavy-tailed way. Heavy-tailed distributions are common in many real systems [19], including networks, and thus these simulations provide a fairly realistic idea of how the system will perform under real loads. Most importantly for simulation performance, the computing power ranges from 1 unit to 300 units. This is at least ten times the range of performance seen in commonly used CPUs. As we will revisit in the performance evaluation, having many nodes that can only accept a few processes before being load-clipped will impact the balance distribution simply due to quantization effects. This issue will have design implications for the implementation.

$$P_k \propto k^{-1}, \quad k^{(\min)} + 1 \le k \le 304,$$

$$P_j \propto j^{-1}, \quad 32 \le j \le 1024,$$

$$P_b \propto b^{-1}, \quad 1 \le b \le b_{\max} \qquad (6)$$
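Each line of Eqn. 6 is a discrete $x^{-1}$ law on a bounded integer range, which can be sampled with `random.Random.choices` and explicit weights. The sketch below uses the minimum in-degree of 4 reported in Section V-D, so that maximum degrees in $[5, 304]$ give powers $P_i = k_i^{(\max)} - 4$ in $[1, 300]$. Which of $j$ and $b$ corresponds to job size and which to arrival burst size is not restated here, so the second draw is left generic. The helper name is hypothetical.

```python
import random
from typing import List


def sample_inverse_power_law(rng: random.Random, lo: int, hi: int, n: int) -> List[int]:
    """Draw n integers x in [lo, hi] with probability proportional to 1/x."""
    values = list(range(lo, hi + 1))
    weights = [1.0 / x for x in values]
    return rng.choices(values, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(42)
    K_MIN = 4  # minimum in-degree used in the experiments (Section V-D)
    max_degrees = sample_inverse_power_law(rng, K_MIN + 1, 304, 2048)
    powers = [k - K_MIN for k in max_degrees]  # P_i = k_max - k_min, in [1, 300]
    j_values = sample_inverse_power_law(rng, 32, 1024, 10)
    print(min(powers), max(powers), j_values)
```

Most sampled nodes land near the bottom of the range, which is exactly the population of low-power nodes whose quantization effects are discussed in Section V-C.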

In all of these simulations we begin with a randomly connected network subject to the initial degree distribution. However, if one starts with a completely ordered network with O(N) diameter, the graph will quickly converge to the low-diameter structure depicted in these simulations.

B. Graph Structure 

The idea at the heart of BON is that the graph structure can capture the load state of a computational network. In Section IV we discussed prior theoretical results that describe the structure of graphs formed using algorithms similar to BON. We now present simulation results for both the uniform and heavy-tailed systems described above. The degree distribution for a balanced overlay network matches the resources of the constituent nodes for both uniform and power-law resource distributions, as seen in Figs. 11 and 12.

Figs. 4 and 5 show that BONs maintain a low diameter and exhibit the property of random graphs that the diameter is proportional to $\ln N / \ln\langle k\rangle$. The changes in connectivity can be seen in Fig. 6.
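The $\ln N / \ln\langle k\rangle$ scaling can be illustrated on a simple stand-in graph. The Python sketch below builds a random directed graph in which every node has exactly $k$ distinct random out-neighbours (an assumption for illustration; it is not a BON and does not model rewiring) and measures the breadth-first eccentricity of one node for comparison with the estimate. All function names are hypothetical.

```python
import math
import random
from collections import deque
from typing import Dict, List


def random_out_regular_graph(n: int, k: int, seed: int = 0) -> Dict[int, List[int]]:
    """Each node gets k distinct random out-neighbours (a stand-in, not a BON)."""
    rng = random.Random(seed)
    nodes = list(range(n))
    graph = {}
    for v in nodes:
        choices = rng.sample(nodes, k + 1)  # draw one extra in case v appears
        graph[v] = [u for u in choices if u != v][:k]
    return graph


def eccentricity(graph: Dict[int, List[int]], source: int) -> int:
    """Longest shortest-path distance (BFS) from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values())


def diameter_estimate(n: int, mean_degree: float) -> float:
    """Random-graph scaling: diameter ~ ln N / ln <k>."""
    return math.log(n) / math.log(mean_degree)
```

For N = 500 and k = 6 the estimate is about 3.5 hops; the measured eccentricity is typically a small constant number of hops above it, since the scaling law fixes only the growth rate, not the exact value.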

It is important that BON graphs remain at least weakly connected as they evolve. All simulations indicate that BONs remain weakly connected, but that when they are load-clipped they can acquire a complex strongly-connected structure. As the load surpasses 1 (the clipping threshold) and the network becomes a $k^{(\min)}$-regular graph, the number of strongly-connected components (SCCs) increases. As shown in Fig. 7, the number of SCCs falls back to unity when an overloaded network becomes less loaded. It is also important to note that the SCCs in an overloaded BON can change due to rewiring of the network. So while not every node will be able to communicate with every other node at each instant of time, the out-component of each node in the graph can change with time. Also, the network remains weakly connected even when it has many SCCs.
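The two connectivity properties tracked here, the number of SCCs and weak connectivity, can be computed with standard graph algorithms. The Python sketch below uses an iterative version of Kosaraju's algorithm for SCCs and a traversal of the underlying undirected graph for weak connectivity; these are textbook methods chosen for illustration, not necessarily what the simulator used. Graphs are assumed to be out-adjacency dictionaries containing every node as a key.

```python
from typing import Dict, List


def count_sccs(graph: Dict[int, List[int]]) -> int:
    """Number of strongly connected components (iterative Kosaraju)."""
    order, seen = [], set()
    for root in graph:                        # pass 1: post-order on the graph
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                order.append(v)               # v finishes after its descendants
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))
    reverse: Dict[int, List[int]] = {v: [] for v in graph}
    for v, outs in graph.items():
        for u in outs:
            reverse[u].append(v)
    count, assigned = 0, set()
    for root in reversed(order):              # pass 2: sweep the reversed graph
        if root in assigned:
            continue
        count += 1
        assigned.add(root)
        stack = [root]
        while stack:
            v = stack.pop()
            for u in reverse[v]:
                if u not in assigned:
                    assigned.add(u)
                    stack.append(u)
    return count


def is_weakly_connected(graph: Dict[int, List[int]]) -> bool:
    """True if the graph is connected when edge directions are ignored."""
    if not graph:
        return True
    undirected = {v: set() for v in graph}
    for v, outs in graph.items():
        for u in outs:
            undirected[v].add(u)
            undirected[u].add(v)
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in undirected[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(graph)
```

For example, a 3-cycle plus one extra node pointing into it has two SCCs but is still weakly connected, the kind of state described for clipped BONs.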

C. Load Balancing Performance 

When discussing load balancing performance, we want metrics which measure how closely load follows capacity. When all the nodes are equally capable, the standard deviation of the load is a convenient measure of balancing; when nodes are heterogeneous, we use the correlation coefficient.
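Both metrics are standard statistics. A minimal Python sketch follows; the Pearson correlation is written out explicitly (Python 3.10 and later also provide `statistics.correlation`), and the choice to pair node power with per-node load is our illustrative reading of "how closely load follows capacity". Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def load_std(loads: Sequence[float]) -> float:
    """Population standard deviation of node load (homogeneous nodes)."""
    return statistics.pstdev(loads)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. node power vs. load (heterogeneous nodes)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A perfectly balanced homogeneous network has zero load standard deviation, and a heterogeneous network whose load scales exactly with power has $r = 1$.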

1) Simple Idealized System: For the uniform simulation model, Fig. 8 shows that the ensemble standard deviation of the node load is just below 1% when the network is in the under-loaded regime. When the network is clipped, the standard deviation of the load is slightly higher than in the under-loaded regime but still quite close to 1%. This difference in performance is likely due to the transition from the degree-correlated load balancing that is in effect when the network is under-loaded to the ball-bin load balancing that takes over when the network is clipped.

Fig. 5. One can see that this system obeys the random graph scaling relationship between diameter and average degree: Diameter $\propto \ln N / \ln\langle k\rangle$. This BON graph was generated using the power-law

Fig. 6. As a BON network evolves there is significant turnover in connections. In the top sub-figure the incoming connections of an arbitrary node, $v_0$, are depicted as a function of time. A vertical line, $t = a$, intersects zero or more points, $A_0$, which is the set of nodes that have directed edges incident to $v_0$. This illustrates that the structure of the graph changes significantly even when macroscopic properties such as the average degree are not changing.

Another important measure of performance is how well BON performs in comparison to a central system that places new jobs at the least loaded node in the network. In the uniform configuration, after 1000 iterations the central system has completed 501314 jobs compared with 501238 jobs completed by BON. This indicates that BON's job throughput is only about 0.015% lower than that of the optimal schedule.
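The throughput gap quoted here, and the one reported for the heavy-tailed configuration below, is simple arithmetic on the reported job counts. The short Python snippet below recomputes both relative shortfalls; each comes out at roughly 0.015%, on the order of 0.01%.

```python
def throughput_shortfall(central_jobs: int, bon_jobs: int) -> float:
    """Fraction by which BON completed fewer jobs than the central scheduler."""
    return (central_jobs - bon_jobs) / central_jobs


uniform = throughput_shortfall(501314, 501238)       # 76 fewer jobs
heavy_tailed = throughput_shortfall(585872, 585788)  # 84 fewer jobs
print(f"{uniform:.4%} {heavy_tailed:.4%}")
```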

Fig. 7. In simulation we observe that BONs are always at least weakly connected directed random graphs. This 2048-node BON with heavy-tailed parameters remains weakly connected even in the clipped regime. The number of strongly-connected components (SCCs) does increase under heavy load, but the number of SCCs returns to unity when the clipping condition passes.

2) Power-Law System: The power-law simulations illustrate an important design criterion for practical implementations. For these simulations the power distribution of the nodes is the power law given in Eqn. 6. The minimum power is 1 and the maximum power is 300, so there are many nodes with very low power resources. This means that for many values of the load it will be impossible to get close to optimal balancing. For this reason, the correlation between degree and free resources is used to evaluate performance, as shown in Fig. 10. A good example is a node with P = 2. Because the load is defined to be L/P, where L is the number of running jobs, the load of this node is limited to non-negative integer multiples of 1/2. Thus, if the network is 75% loaded, this low-powered node is equally unbalanced whether one or two jobs are running. By selecting a suitable minimum power, one can bound this finite-size effect. For example, if the least powerful node has P = 5, then it can always get within 10% of the optimal value. This finite-size effect appears as cyclical behavior of the load standard deviation and can clearly be seen in Fig. 9.
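The P = 2 and P = 5 examples follow from a half-step argument: a node of power P can only realize loads that are multiples of 1/P, so any target load is at most 1/(2P) away from an achievable value. The Python sketch below, using exact fractions, is our reading of the bound rather than code from the paper; note that Python's `round()` on a `Fraction` resolves ties to the even job count.

```python
from fractions import Fraction


def worst_case_quantization_error(power: int) -> Fraction:
    """Largest gap between a target load and the nearest achievable L/P.

    Achievable loads on a node of power P are 0, 1/P, 2/P, ..., so any
    target lies within half a step, 1/(2P), of one of them.
    """
    return Fraction(1, 2 * power)


def nearest_achievable_load(power: int, target: Fraction) -> Fraction:
    """Closest multiple of 1/P to the target load (ties go to the even job count)."""
    jobs = round(target * power)  # Fraction.__round__ returns an int
    return Fraction(jobs, power)
```

At a 75% target, a P = 2 node is 1/4 away from its nearest achievable load (one job gives 1/2, two jobs give 1, both equally far), while a P = 5 node is never more than 1/10 away, matching the 10% figure above.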

As was done for the uniform simulation configuration, we compare the job throughput of a central scheduler with BON throughput on the same load trace. In the heavy-tailed configuration, after 1000 iterations the central system has completed 585872 jobs, compared with 585788 jobs completed by BON. As with the uniform configuration, BON's job throughput in the heavy-tailed configuration is only about 0.01% below that of the optimal schedule. Note that this result ignores the effect of job distribution latency on total throughput, but it indicates that job placement is very close to optimal when communication delays are ignored.

Fig. 8. This uniform-resource, 2048-node BON under increasing overload also shows that the standard deviation of the load is low, at about 1%. The difference in performance as the network enters the clipped regime can also be seen. At the clipping transition point the standard deviation spikes, likely because jobs accumulate in a small SCC before additional rewiring can rebalance the load. After a short time this imbalance dissipates as rewiring allows load to be distributed throughout the network.
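
The throughput figure follows directly from the two job counts reported above. This short check shows the arithmetic. The helper name `relative_shortfall` is ours.

```python
# Completed jobs after 1000 iterations (heavy-tailed configuration), from the text.
central_jobs = 585872
bon_jobs = 585788

def relative_shortfall(reference, observed):
    """Fractional throughput shortfall of `observed` relative to `reference`."""
    return (reference - observed) / reference

shortfall = relative_shortfall(central_jobs, bon_jobs)
print(f"BON completed {central_jobs - bon_jobs} fewer jobs "
      f"({shortfall:.4%} of the central schedule)")
```

The gap is 84 jobs, roughly 0.014%, which is consistent with the "about 0.01%" quoted above.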

D. Ball-Bin Regime 

Every node in the graph must maintain a minimum degree to ensure that the graph stays at least weakly connected. For these experiments each node maintains at least 4 incoming edges. As a result, if the network's load becomes clipped, there is no longer a correlation between a node's degree and its resources. For this reason the metric actually sampled on the walk is the amount of computing power that the next incoming process can expect on a given node. When the network is not clipped, this is the same as choosing the highest-degree node on the walk. For a clipped network, however, it selects the node on the walk with the largest expected power for the next incoming job, as shown in Algorithm 1. Now observe that a clipped network is approximately a regular random graph, so a short random walk samples the nodes approximately uniformly. The problem then becomes very similar to ball-bin load balancing [1], PO].

Fig. 10. When many nodes have a low maximum degree, the correlation between node power and in-degree is a more appropriate measure of performance than the standard deviation of the load. For an increasingly loaded network, load vs. power is plotted at successive instants and the correlation is calculated. Even when the network is at 3× the load clipping threshold, r² > 0.99.

Fig. 11. When all nodes have the same resources, BONs are approximately regular graphs. These degree distribution snapshots (top graph) of an evolving network that is progressively more loaded show the regular nature of the graph over a wide range of load conditions.



$$
\begin{cases}
\langle k \rangle > k_{\min} & \text{preferential sampling} \\
\langle k \rangle = k_{\min} & \text{ball-bin sampling}
\end{cases}
\qquad (7)
$$
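
The placement rule discussed in this subsection can be sketched as a short random walk followed by a greedy choice. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1. We assume that the overlay is stored as a dict of out-neighbour lists and that the walk length is ⌈log₂ N⌉. We also assume that the "expected power for the next incoming job" is P/(L + 1). The names `place_job` and `expected_power` are hypothetical.

```python
import math
import random

def expected_power(power, jobs, node):
    """CPU share a new job would get on `node`: P / (L + 1) (assumed metric)."""
    return power[node] / (jobs[node] + 1)

def place_job(out_edges, power, jobs, start, rng):
    """Hypothetical sketch of walk-based job placement.

    out_edges: dict mapping each node to a list of its out-neighbours
    power:     dict mapping each node to its computing power P
    jobs:      dict mapping each node to its number of running jobs L
    The walk takes ceil(log2 N) steps along uniformly chosen out-edges, so
    nodes with many incoming edges tend to be visited more often; the
    visited node with the largest expected power receives the job.
    """
    steps = max(1, math.ceil(math.log2(len(out_edges))))
    node, visited = start, [start]
    for _ in range(steps):
        neighbours = out_edges[node]
        if not neighbours:  # dead end: stop the walk early
            break
        node = rng.choice(neighbours)
        visited.append(node)
    return max(visited, key=lambda v: expected_power(power, jobs, v))

if __name__ == "__main__":
    # Toy directed cycle 0 -> 1 -> 2 -> 3 -> 0 with illustrative values.
    out_edges = {0: [1], 1: [2], 2: [3], 3: [0]}
    power = {0: 1.0, 1: 1.0, 2: 8.0, 3: 100.0}
    jobs = {0: 0, 1: 0, 2: 0, 3: 0}
    # 2: node 3 is never reached in two steps, so node 2 wins.
    print(place_job(out_edges, power, jobs, start=0, rng=random.Random(0)))
```

The sketch ranks visited nodes by expected power rather than in-degree. Ranking this way keeps selection meaningful in the clipped regime, where every node sits at the minimum degree.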

In ball-bin systems each ball is assigned uniformly at random to one of N bins. As this process is repeated, a distribution of bin populations emerges, which has been studied extensively under many kinds of assumptions. The important result from ball-bin systems is that probing the population of more than one bin before assigning a ball reduces the population of the fullest bin exponentially. When N balls are placed in N bins, the maximum population drops, with high probability, from Θ(log N / log log N) with a single random choice to log log N / log d + O(1) with d ≥ 2 choices. This work is often referred to as the "power of two choices" [1].

Fig. 9. Here we see the cyclical nature of the load standard deviation as a function of network load. The dips occur when the network is at an integer multiple of the load clipping threshold. This is due to the assumption in this model that most of the nodes have a very low degree (power-law distribution). For instance, suppose the network is x% loaded, and let $\ell_{\text{load}} = x/100$. A node with a maximum in-degree of $k_{\min} + k_{\text{load}}$ will then only be able to have the same load as the rest of the network if $j / k_{\text{load}} = \ell_{\text{load}}$ for some $j \in \{0, \ldots, o\,k_{\text{load}}\}$, $o \in \mathbb{Z}^{+}$.

The load-clipped regime is similar, except that instead of two choices we have the power of log N choices. The nodes visited by each random walk on the $k_{\min}$-regular graph are approximately uniform random samples. The least loaded of these log N sampled nodes is then chosen to accept the new load. We make this connection to give intuition for why BON should function in the overloaded regime, but we do not examine this aspect of the system in detail here. Follow-up work will compare overloaded BON performance with the theoretical predictions for ball-bin systems.
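
To build intuition for the comparison above, the following sketch simulates the classic balls-into-bins process with d probes per ball. It assumes, as an idealization, that the nodes visited by a walk on a clipped ($k_{\min}$-regular) BON behave like independent uniform samples. The function name is hypothetical. No particular output is claimed, because results vary with the random seed.

```python
import math
import random

def max_bin_population(n_bins, n_balls, choices, rng):
    """Balls-into-bins with `choices` probes per ball.

    Each ball samples `choices` bins uniformly at random (with replacement)
    and is placed in the least populated of them; choices=1 is plain
    random assignment. Returns the population of the fullest bin.
    """
    bins = [0] * n_bins
    for _ in range(n_balls):
        probes = [rng.randrange(n_bins) for _ in range(choices)]
        target = min(probes, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)

if __name__ == "__main__":
    n = 2048  # same size as the simulated BONs
    rng = random.Random(1)
    for d in (1, 2, math.ceil(math.log2(n))):  # one, two and log2(N) = 11 choices
        print(f"d = {d:2d}: fullest bin holds {max_bin_population(n, n, d, rng)} balls")
```

Running it typically shows the largest drop between d = 1 and d = 2, with smaller gains beyond that. This matches the log log N / log d behaviour.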

VI. Practical Considerations 

1) Network: In the presented simulations some networks have nodes with hundreds of incoming edges. Modern computers can easily maintain hundreds of simultaneous TCP connections, but for BON it may be preferable to use UDP for some aspects of the network. BON nodes interact with edges when load is distributed using random walks and when edges are created or destroyed. These edges are important because they maintain the state of the network, but if a connection goes down it can easily be replaced without affecting system performance. When load distribution messages are random walking through the network, however, reliable transmission is important. There are numerous ways to provide reliable communication between nodes over either TCP or UDP. Using fast, lightweight protocols while minimizing latency will be an important design issue for a BON implementation. At any given time, most connections will not be transmitting BRDMs but will be maintaining the network state. For state maintenance, UDP will drastically reduce overhead compared with TCP, allowing a much larger number of edges to be maintained. Using soft-state information from packet traffic to perform keep-alive operations will further reduce connection maintenance overhead.

Fig. 12. For a network with a power-law resource distribution, the degree distribution also has a power-law form, with the exponent becoming more negative as load increases.
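
Using packet traffic as soft state for keep-alives could look roughly like the sketch below. The class and method names are hypothetical. Transport details, such as UDP sockets and retransmission for job-distribution messages, are deliberately left out. Timestamps can be passed explicitly so that the expiry logic is easy to test.

```python
import time

class SoftStateEdges:
    """Hypothetical soft-state table of overlay edges.

    Any packet received from a peer (a keep-alive or a piggybacked
    message) refreshes that edge; edges not refreshed within `timeout`
    seconds are treated as lost and dropped so they can be replaced.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # peer address -> time of last packet

    def on_packet(self, peer, now=None):
        """Record traffic from `peer`; this doubles as a keep-alive."""
        self.last_seen[peer] = time.monotonic() if now is None else now

    def expire(self, now=None):
        """Drop silent edges and return the peers that were removed."""
        now = time.monotonic() if now is None else now
        stale = [p for p, t in self.last_seen.items() if now - t > self.timeout]
        for p in stale:
            del self.last_seen[p]
        return stale

    def live_peers(self):
        """Peers whose edges are currently considered alive."""
        return sorted(self.last_seen)
```

Because any received packet counts as a refresh, no dedicated keep-alive traffic is needed on edges that already carry messages.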

2) State Encoding: For the load objective function we will follow an approach similar to that of the Mosix SSI cluster computing system [2]. The Mosix migration algorithm is heuristic in nature and essentially attempts to run processes where they will finish most quickly. Various historical data about process execution, node load and node resources are used to judge which node can process a job at the least cost. Additionally, Mosix uses a memory ushering protocol to migrate processes away from nodes with depleted memory, rather than trying to combine the memory and CPU metrics into a single scalar value. These methods were motivated by profiling of real systems and have proved successful. Thus the node resource that will be kept proportional to in-degree is the node's available CPU resources. In particular, we wish new load to be assigned to the node $v_i$, where $i = \arg\max_j \frac{P_j}{L_j + 1}$. Here $P_j$ is $v_j$'s power, which can be any standardized measure of the number of operations per unit time that a node can perform. $L_j$ is the number of processes competing for $P_j$ (the UNIX load). How to weight integer, floating-point and other processing characteristics will not be considered here. We assume that a reasonable CPU benchmark can be constructed and run periodically on each BON node.
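
The selection rule $i = \arg\max_j P_j/(L_j + 1)$ can be evaluated directly over a set of candidate nodes. In this small sketch the helper name `select_node` and the numbers are illustrative only. It shows that a lightly loaded, modest node can win over a much more powerful but busy one.

```python
def select_node(powers, loads):
    """Index i maximising P_i / (L_i + 1): the CPU share a new process
    would receive on node i if it joined as one more competitor."""
    return max(range(len(powers)), key=lambda j: powers[j] / (loads[j] + 1))

if __name__ == "__main__":
    powers = [300.0, 10.0, 2.0]  # benchmark-derived powers (illustrative values)
    loads = [150, 4, 1]          # current number of competing processes
    # Scores: 300/151 ≈ 1.99, 10/5 = 2.0, 2/2 = 1.0
    print(select_node(powers, loads))  # 1
```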

3) Load Quantization: Since computing power is represented by the edges in the network, it is important to scale the power that each edge represents. The aim is to get the most load-balancing performance for the least bandwidth and state maintenance. The initial implementation of BON will designate a reference computer as the baseline for computing power. As computer performance changes, adaptive baselining can be performed to automatically rescale how much computing power a BON edge represents. All other node powers are computed with respect to the kth percentile of benchmarks. That is, all nodes in the kth percentile have the baseline power $P_b$ and maintain at most $k^{b}_{\max} - k_{\min} = 5$ baseline resource edges. All other nodes maintain

$$k^{i}_{\max} - k_{\min} = \frac{P_i}{P_b}\left(k^{b}_{\max} - k_{\min}\right) \qquad (8)$$

resource edges. Choosing $k^{i}_{\max} - k_{\min} \geq 5$ for all $i$ ensures that even the least powerful nodes in the network can have a load that is within 10% of optimally balanced. This holds because a node with five resource edges adjusts its load in steps of 1/5, and so is never more than 1/10 from its target.
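
Eqn. (8) can be turned into per-node edge budgets once benchmark scores are available. The sketch below relies on several assumptions that are ours, not the text's. It uses the nearest-rank method for the percentile that defines $P_b$ and rounds each budget to the nearest integer. It also floors every budget at 5, following the requirement that $k^{i}_{\max} - k_{\min} \geq 5$ for all $i$. The text leaves the choice of percentile open, and the function names are hypothetical.

```python
import math

BASELINE_EDGES = 5  # k^b_max - k_min for a baseline node, as stated in the text

def baseline_power(benchmarks, percentile):
    """P_b: nearest-rank percentile (0 < percentile <= 100) of benchmark scores."""
    ordered = sorted(benchmarks)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]

def resource_edges(benchmarks, percentile):
    """Resource edges k^i_max - k_min per node following Eqn. (8).

    Each node's count is its power relative to P_b times the baseline
    count, rounded to an integer and floored at BASELINE_EDGES so that
    every node keeps at least 5 edges (load steps of at most 1/5).
    """
    p_b = baseline_power(benchmarks, percentile)
    return [max(BASELINE_EDGES, round(p / p_b * BASELINE_EDGES)) for p in benchmarks]

if __name__ == "__main__":
    scores = [10, 20, 40, 80]  # illustrative benchmark results
    print(resource_edges(scores, percentile=25))  # [5, 10, 20, 40]
    print(resource_edges(scores, percentile=50))  # [5, 5, 10, 20]
```

Raising the baseline percentile shrinks every node's edge budget, trading balancing granularity for less state. The floor keeps the weakest nodes within the 10% bound.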



VII. Conclusion 

Balanced overlay networks (BON) is a novel 
decentralized load-balancing approach that encodes 
the balancing algorithm in the evolving structure of 
the graph that connects the resource-bearing nodes. 
BON is scalable, self-organized and relies only on local
information to make job assignment decisions. New
jobs are assigned to a node by a random walk 
on the graph which not only samples the graph 
preferentially, but also selects the highest-degree 
node that was visited on the walk. Each node's 
unused resources are proportional to its degree so 
this approach works very well when a network is 
not loaded beyond its clipping point. When a BON 
is clipped the relationship between load and in- 
degree breaks down but the balancing performance 
remains quite good due to the so-called "power 
of two choices" in ball-bin load balancing. Based 
on previous theoretical results and extensive sim- 
ulation results, BON is seen to be efficient and 
practical. Further ongoing work on this problem 
includes geographical awareness extensions using 
more complex walk objective functions, a reference 
implementation on PlanetLab, theoretical analysis 
of the random walk with greedy node selection, 
algorithmic optimizations and a full comparison of 
overloaded regime results with the predictions of 
ball-bin random load-balancing. Finally it should be 
noted that this is only one possible way to encode 
information about a network in its topology; other 
distributed algorithms may benefit from using graph 
state to bias node selection. 